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Abstract. Rare event simulation and estimation for systems in equilibrium are 
among the most challenging topics in molecular dynamics. As was shown by 
[~~. 1 Jarzynski and others, nonequilibrium forcing can theoretically be used to obtain 

equilibrium rare event statistics. The advantage seems to be that the external 
force can speed up the sampling of the rare events by biasing the equilibrium 
distribution towards a distribution under which the rare events is no longer rare. 
^ ) ■ Yet algorithmic methods based on Jarzynski's and related results often fail to 

Obe efficient because they are based on sampling in path space. We present a 
new method that replaces the path sampling problem by minimization of a cross- 
cntropy-like functional which boils down to finding the optimal nonequilibrium 
forcing. We show how to solve the related optimization problem in an efficient 
■ way by using an iterative strategy based on milestoning. 



(NT 1. Introduction 

>', 



Molecular dynamics (MD) simulations allow for analysis and understanding of the 
dynamical behaviour of molecular systems. However realistic simulations on timescales 
£C) ■ beyond microseconds are still infeasible even on the most powerful general purpose 

computers, which renders the MD-based analysis of many biological equilibrium 
processes, that are often rare compared to the characteristic time scale of the system 
£NJ . and hence require prohibitively long simulations, impossible. The hallmark of these 

rare events is that the average waiting time between the events is orders of magnitude 
longer than the timescale of the switching event itself. Thus rare event simulation and 
estimation are among the most challenging topics in molecular dynamics. 
r> , The molecular dynamics literature on rare event simulations is rich. Since direct 

numerical equilibrium simulation is infeasible, all available techniques try to sample 
from the rare event statistics by biasing the system in one or the other way. Roughly 
speaking, we can distinguish between two major classes of sampling techniques: class 
A consists of splitting methods that decompose state space, but are still essentially 
based on an equilibrium distribution, whereas methods from class B proceed by driving 
the system under consideration into a nonequilibrium regime that changes the rare 
events statistics. For a general overview of Monte-Carlo methods for rare events in 
other application fields, we refer to the textbook [2]. 

The list of methods in class A range from reaction-coordinate based techniques 
via path-space oriented techniques to approaches based on interface sampling or 
generalized dynamics. Reaction-coordinate based techniques consider the marginal 
of the equilibrium distribution in some low-dimensional collective variables like in 



direct free energy calculations [4|; they suffer from the fact that appropriate reaction 
coordinates are often not available. Path-space oriented techniques approximate 
the most important reaction paths that govern the rare event statistics either by 
sampling distribution of reactive paths like in transition path sampling (TPS) [9] [3] 
or by optimizing an appropriate path functional like in the string method [13] ; they 
become problematic if the path space distribution is multi-modal or generally too 
complex (e.g., involving bifurcations). Interface sampling techniques like milestoning 
[14] or forward flux sampling (FFS) [TJ place a set of suitably chosen interfaces in 
state space between the initial and final state and use them to follow the transition 
of the system in an iterative manner using equilibrium trajectories that connect 
neighbouring interfaces. The idea of generalized dynamics such as hyperdynamics [39], 
metadynamics [21], conformational flooding [17] . or the adaptive biasing force (ABF) 
method [7] is to bias the system on-the-fly (e.g., by filling in certain energy wells in 
which the system got trapped during a simulation) so as to enhance rare transitions 
between metastable states. Although seemingly different, generalized dynamics belong 
to class A, in that they only alter the underlying equilibrium distribution along a 
predefined set of low-dimensional collective variables. Although these methods have 
proven to be very efficient, they require that the interesting processes can be described 
by a few collective coordinates that have to be known in advance. 

Class B consists of methods based on the Jarzynski and Crooks formulae 
|21l [5] that relate the equilibrium Hclmholtz free energy to the nonequilibrium work 
exerted under external forcing. Instances of nonequilibrium simulations that mimic 
experiments on controlling and manipulating single molecules (see, e.g., [33] [28] ) 
are single-molecule pulling [19] . steered molecular dynamics [36] or bridge sampling 
[25] , to mention just a few. The corresponding path functionals have the form of 
cumulant-generating functions for the exerted work [231 126] which poses immense 
challenges to Monte- Carlo simulations and limits the usability of the formulae in 
practice. Roughly speaking, the usability is limited by the fact that the likelihood 
ratio between equilibrium and nonequilibrium trajectories is highly degenerate, for 
the overwhelming majority of nonequilibrium forcings generate trajectories that have 
almost zero weight with respect to the equilibrium distribution that is relevant for the 
rare event; cf. also the discussion in [27]. Nevertheless the underlying idea is appealing 
and a cleverly designed external force may speed up the sampling of the rare events by 
biasing the equilibrium distribution of the system towards a distribution under which 
the rare events is no longer rare, while giving numerical estimators that are useful in 
terms of variance and convergence properties. 

The method presented in this article belongs to the latter class, but shares somes 
ideas with ideas from class A. It takes up the idea that external forcings can speed up 
the rare event but avoids sampling issues related to nonequilibrium processes. Instead 
it uses optimal nonequilibrium forcing in connection with splitting methods such as 
FFS or milestoning, in the sense that the new method uses interfaces to follow the 
transition of an optimally driven system where the external forcing that drives the 
system from one interface to the next results in a considerable speed-up compared to 
FFS or milestoning. Specifically, the new method replaces the path sampling problem 
using an exponential change of measure that can be explicitly computed by minimizing 
a cross-entropy-like functional, which then yields the optimal forcing. Although the 
minimization involves solving an optimal control problem, the numerical effort can 
be drastically reduced when the minimization is done in a clever way; one reason is 
that the path functional becomes linear after the change of measure whereas it was 
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exponential in the original cumulant-generating function. 

Transformations based on exponential change of measures have a rich tradition 
in the (risk-sensitive) optimal control literature |20[ El [16] and the theory of large 
deviations [HI HU] , and are regularly rediscovered — mostly aiming at turning certain 
optimal control problems into linearly solvable sampling problems [22, 33 [12]; cf. also 
|37 [ I32 | . Here we pursue the reversed strategy and turn a difficult rare event estimation 
problem into an optimal control problem that can be solved by minimizing a suitable 
functional. Thus the basic outline of the new method is: iteratively determine the 
optimal nonequilibrium forcing by an optimization procedure based on milestoning 
ideas that avoid path-space sampling and compute the equilibrium rare event statistics 
from the optimal nonequilibrium forcing. 

Besides introducing the new method the purpose of this article is to explain the 
basic ideas of how to use optimal control for the estimation and simulation of rare 
events. Therefore we present only the simplest possible scenario (a particle following an 
overdamped Langevin dynamics in a conservative force field) , without paying too much 
attention to complete generality or mathematical rigour. The first issue in Section [2] 
then is to introduce the variational characterization of (generalized) free enregy and 
the exponential change of measures that are the basis of our optimal control approach. 
The precise formulation of the optimal control problem, a stochastic control problem 
with quadratic control costs and an indefinite time horizon, is given in Section [5] In 
Section [4] we describe the numerical method for computing the optimal control, based 
on an inexact gradient descent in connection with a milestoning algorithm, and apply 
it to the controlled first passage between metastable sets. We briefly summarize the 
results in Section [5] and sketch possible generalization that have been omitted for the 
sake of brevity. 



2. A variational characterization of free energy 

We consider a particle with position X t <E M" at time t > which moves in an energy 
landscape V: R™ — > E according to the equation 

dX t = -VV{X t )dt + VTedB t , X Q = x . (2.1) 

Here B t denotes standard n-dimensional Brownian motion, and e > is the 
temperature of the system. Under mild conditions on the energy landscape function 
V we have ergodicity, and the law of X t converges to a unique equilibrium distribution 
with density 

p(x) = Z~ 1 exp(-e" 1 y(x)) , Z= exp(-e _1 F(ic)) dx . 

We assume throughout that the temperature is small, relative to the largest energy 
barriers, i.e., e <C AV^ nax - As a consequence, the relaxation of the dynamics towards 
equilibrium is dominated by the rare transitions over the largest energy barriers. 

Let W be a random variable that depends on the sample paths (X t )o<t< T up to 
a stopping time r. We will call W work in the following. Given some continuous 
function /:R™ x [0,oo) — > K, we suppose that it can be expressed a^J 

W= f f(X t )dt. (2.2) 
Jo 

X The following considerations below are not at all limited to systems of the form 1 12. Ill and path 
functionals like 112.21 1 and can be easily can be easily generalized to, e.g., non-gradient systems with 
multiplicative and/or degenerate noise or observables / that are explicitly time-dependent. 
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Let us further denote by P the probability measure on the space of continuous 
trajectories that is generated by the Brownian motion in (|2.1|) . and let ~E X [-] = 
E[-|Xq = x] be the expectation with respect to P, i.e., the average over all realizations 
of X t starting at Xq — x. We call the quantity 

F(x) = -elogE x [exp(-W7e)] . (2.3) 

the (conditional) free energy of W with respect to P. 

Remark 1. Clearly, the functions and the expectation on the right hand side of 
12. 3\) do not commute, and it follows by Jensen's inequality that F(x) < E X [W], in 
accordance with the second law of thermodynamics. But F encodes information about 
the cumulants of the work W (assuming they exist), namely, 

F{x) = E X [W] + ^E x [{W - E X [W}) 2 ] + ... . 



Remark 2. The similarity between \2.3\) and Jarzynski's formula J21f is no 
coincidence. If t = T is a deterministic stopping time and W is the nonequilibrium 
work done on a system during a transition between two equilibrium states E\ and E 2 , 
then F(Ei) equals the equilibrium free energy difference between E\ and E%. 

The phrases "work" for the quantity W defined in (|2.2|) and "free energy" for F 
as of (|2.3p are just used to relate to Jarzynski's formula. The framework is much more 
general as the following example will show. 

Guiding example. One example, of which we will consider variants below, is the 
first hitting time of a subset of state space. To this end let S C K™ a set and define 

r = inf {t >0:X t eS}. 

to be the first time at which Xt hits S. Choosing the constant function / = a in (|2.2|) . 
the free energy 

F a {x) = -elogE a: [exp(- ( TT/e)] . 

considered as a function of a is the scaled cumulant-generating function of r when X t 
is started at Xq = x. In particular, we can compute the mean first hitting time by 

dF 

e ^L =E*[t]. 
2.1. Relative entropy and change of measures 

The strict convexity of the exponential function implies that equality F(x) = E X [W] 
is only attained if W is P-almost surely constant; one such case is the adiabatic limit 

W= lim )- I f{X t )dt. 

T^oo T J Q 



We will restore (|2.3p to an expression that becomes linear in W after a suitable 
change of measure. To this end let Q denote a probability measure on the space 
of continuous trajectories that is absolutely continuous with respect to P (i.e., 
tp = dQ/dP exists). We define the relative entropy of Q with respect to P as 

I(Q\\P)= [ logfgW (2.4) 
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(This is also called the Kullback-Leibler divergence.) We declare that I(Q\\P) = oo if 
Q is not absolutely continuous with respect to P. Then, by Jensen's inequality, 

F(x) = -elog-E x [exp(~W/e)} 

= -e log W Q [exp(- W/e - log <p)] (2.5) 

<W Q [W]+eI{Q\\P), 

where we have used the notation Eq[«] to denote the expectation with respect to Q. 
The last inequality that appears in the literature in various forms as second-law-like 
identity or generalized Jarzysnki inequality (cf. [351 118] ) suggests that the free energy 
and the relative entropy are related by a Legendre-type transformation, viz., 

F(x)=M{W Q [W]+eI(Q\\P)}, 

and a result in [6] implies that the infimum exists and is attained when Q runs over 
all path measures that are absolutely continuous with respect to P. By the strict 
convexity of the exponential function, the latter implies that W + e log tp is Q-almost 
surely constant. 

The idea of the approach sketched below then is to represent Q in terms of 
suitable (parametric) control variables and minimize the right hand side of (|2.5[) over 
all admissible controls. 



3. An optimal control problem 

The aim of this section is to derive necessary and sufficient conditions for the optimal 
change of measure that turns (|2.5j) into an equality. To this end we follow ideas by 
Fleming and co-workers [151 110] and consider the exponential cost functional: 



1>(x) = E a 



exp^-e" 1 ^ f(X s )ds 



(3.1) 



For a stopping time r that is the first hitting time of a set S C K™, the Feynman-Kac 
formula |31) implies that tp solves the elliptic boundary value problem 

eL^ = f^, ^|as = l, (3.2) 

where 

L = eV 2 +W -V . (3.3) 

is the infinitesimal generator of X t , defined on a suitable subspace of L 2 (R"). We 
want to transform the boundary value problem (|3.2j) into an equation for the unknown 
control variable in (|2.5|) . For this we proceed in two steps. 



Step 1: We can safely assume that r is almost surely finite. As a consequence, the 
function ip in (13.1j) admits a formal representation of the form 

ip = cxp(— F/e) . 

We seek an equation for the free-energy F. By chain rule, it follows that 

eexp(^/e) J Lexp(-F/e) = -LF + \VF\ 2 , 
which entails that Q3.2]) is equivalent to 

LF-|VF| 2 + / = 0, F\ ds =Q, (3.4) 

The last equation is known as the Hamilton- Jacobi-Bellmann (HJB) equation of 
optimal control [16] : its solution is called value function or optimal cost-to-go. 



5 



Step 2: To reveal the stochastic optimal control problem that corresponds to the 
HJB equation (|3.4j) . we first note that 



|VF| 2 = min \V2c-\7F +-\c\ 2 \ 

cSR" [ 2 J 



from which we recognize that (|3.4[) is equivalent to 

min {L(c)F + g(x,c)} = 0, F\ ds = , (3.5) 

ceR" 

with the shorthands 

ff (x,c) = /( 2 ;) + i| C | 2 

and 

L(c) = eV 2 + (V2c- W) -V. 

Equation p.5[) is the Hamilton- Jacobi-Bellman equation of the following optimal 
control problem that should be compared to the right hand side of (12.51) : minimize 



I(u) = E 



g(X t ,u t )dt 



o 



(3.6) 



over an admissible set U of control laws u with values in W 1 and subject to the tilted 
dynamics 

dX t = (y/2ut - VV(-Xt)) * + V2^ dB t . (3.7) 

That is, the expectation in ()3.6j) has to be taken wrt the path measure Q generated 
by the dynamics given by (|3.7p . 

Remark 3. The dynamics that generates the new path measure Q is again of gradient 
form if u = u* is the optimal Markovian feedback control, i.e. when Q = Q(u*). As 
a consequence, the optimally controlled process satisfies detailed balance i26\j . Indeed, 
since 13. 6\) is quadratic and l{3.7\ ) is affine in the control, the minimizer 

c*(x) = axgnxm{L(c)F + g(x, c)} , 

c 

in US. 5\) is unique (provided that F is sufficiently smooth). The optimal feedback law 
is then given by u* t = -V2VF(X t ) and gives rise to the tilted dynamics 

dX t = -VG(X t )dt + V2~edB t , X t eR n \S, 

with the tilted potential 

G{x) = V(x)+2F(x) . 

Guiding example, cont 'd. In some cases it is helpful to pursue a reverse strategy and 
transform the nonlinear HJB equations of an optimal control problem into a linear 
equation that may be easier to solve (cf. [22l [38] ). 

Consider a Brownian particle under a microscope with a moveable object holder. 
Let Del 2 denote the microscope's focal disc, X t € K 2 the particle position at time 
t > 0, relative to the position of the object holder, and u t the motor force. The control 
task is to move the object holder such that the particle stays in the focus as long as 
possible. Hence the control objective is the maximization of the mean first exit time 
from D which amounts to minimizing the cost functional 

' ' \u t \ 2 dt 



I(u) = E 



2 
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subject to 
Let 



dX t = V2u t + V2edB t 
F(x 



min E x 



1 

T+ 2 



\uA 2 dt 



be the value function (free energy) of the problem and 
i/>(x) = E K [exp(r/e)] . 

Then the linear boundary value problem for tp — exp(—F/e) is a Helmholtz equation 
with Dirichlct boundary conditions, 

e 2 V 2 V + ^ = 0, il>\ 8D = l, 
which can be solved by standard means. 

4. Greedy milestoning algorithm 

At hrst sight it seems that we have not gained much, for we have transformed 
the original path sampling problem into a complicated nonlinear optimal control 
problem. However the optimal control formulation opens up other options for the 
numerical treatment of the rare event sampling in terms of a minimization problem. 
Another advantage is that it is relatively easy to construct unbiased estimators of the 
control functional, avoiding both bias and variance issues when estimating exponential 
observables such as 



Discretization Together with the information that the optimal Markov control is of 
feedback form our minimization problem (I3.6I) - (|3.7I) takes the form 



Fix) = min En 

u t =c(X t ) W 



g(X u u t )dt 



with Q denoting the path measure generated by the dynamics given by (|3.7j) . We 
discretize this optimization problem by choosing a finite dimensional ansatz space 
for the space of admissable feedback functions c: We choose sufficiently smooth and 
integrable vector fields bf.W 1 — > R n , j = 1, . . . , m, so that 



c(x) — a,jbj(x) , a,j € K , 

3=1 

or, respectively, we choose scalar ansatz function vj:] 



I, j = 1, . . . , m, so that 



3=1 

The minimization problem then amounts to minimizing the cost functional 



1(a) = E Q 



£ f/(X s ) + i|^a i ( s )& i (X s 



ds 



(4.1) 



over the unknown coefficients a = (a l7 . . . , a m ) where Q = Q(a), the path measure 
of the controlled diffusion (13.71) also depends on the coefficients; for the moment we 
remain with the imprecise statement that the measure Q has a density </?(•; a) with 
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respect to a (fictitious) uniform measure on the space of all continuous paths in R n , 
which is a function of the unknown coefficients!?! 



Gradient descent We minimize the cost functional 1(a) by a doing a gradient descent 
in the coefficient vector a = (a±, . . . , a m ). Specifically, we iterate the map 

a ( i + 1 ) = aW-a i V7(a«), 

where i is the iteration index and (aj)j>i is a bounded sequence of stepsizes for the 
gradient search. For instance, we can do a line search in the descent direction and 
determine a% so that it satisfies the Wolfe condition [30] . Details of the iteration that 
is based on an Euler-Maruyama discretization of the path measure Q will be given 
below in the appendix. The overall algorithm thus has the following steps: 

• Choose scalar-valued ansatz functions Vj with support in the interesting region 
of state space and related vector fields bj = —y2VVj. 

• Choose initial coefficients a*- ' = (a^) such that the free energy or value function 
Sjli a j v j( x ) fiH s U P tn e main wells in the energy landscape V. 

• Iterate the following steps in i, starting with i — 0, until a prescribed termination 
criterium is satisfied: 

(i) Sample the path measure Q = Q(a^) and evaluate V/(a^^) (see formula 
(|A.4|) in the appendix). 

(ii) Perform a gradient descent a^ +1 ^ = — ctiVl(cS 1 }). 

Remark 4. The gradient search algorithm can be regarded as a variant of the cross- 
entropy method that is a relatively new Monte- Carlo technique for the sampling of rare 
events which goes back to Rubinstein and others [3^]. It is based on the idea that an 
optimal change of measure can be found by minimizing the Kullback-Leibler divergence 
\2.4\ ) over a family of probability measures Q in terms of the tilting parameter c. 
Compared to equilibrium rare event simulation algorithms used in molecular dynamics 
using the optimal change of measure has the advantage that the likelihood ratio dQ/dP 
stays of order one, while rare events under the original dynamics (here: diffusion in 
an energy landscape V) are no longer rare under the forced dynamics \3. 7j ). As a 
consequence, sampling the path measure Q is significantly more efficient than sampling 
the original path measure P since the trajectories to be sampled from Q are much 
shorter on average (i.e., the expected hitting time is considerably shorter). 

Milestoning algorithm For problems with a large state space or for strongly 
metastable systems, the above algorithm may still be inefficient since sampling 
the path measure Q may involve many rather long trajectories. In this case the 
computation can be broken down to transitions between neighbouring interfaces as in 
milestoning [T3] or in FFS pQ. We explain the basic steps of this procedure: Let 

F(x) = minE; 

a 

§ More precisely, Q = Q* is the probability to find paths (^ s )o< s <T i n a small tube around a 
smooth curve 7 : [0,T] -> R n , i.e., Q s x (^) = P(\\X S - -y(s)\\ < S | X = x). By the Girsanov theorem, 
Q x = lim5_>o Qx has a density <p = exp(— 5(7)) with respect to the Gaussian measure induced by 
the Brownian motion B s = x + \/2eB 3 , where 5(7) is the Onsager-Machlup functional 




g{X s ,c(X s ))ds 
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Figure 1. Illustration of nesting of sets for the milestoning iteration. 



denote the semi-discretized value function of the problem, with the shorthand 



Suppose that S = So is the set of interest and t = tq is the first hitting time of So; 
we now choose nested sets or milestones So C Si C S2 C . . . (cf. Figure [J) . We first 
compute F in Si \ So by finding the optimal control policy c in Si \ So- That is, our 
ansatz functions in the above gradient descent algorithm only have to be non- vanishing 
in Si \ So- In particular this gives F on dSi, the outer boundary of Si \ So- We can 
repeat the same algorithm in the set S2 \ Si; then letting 1 e S 2 \ Si and letting n 
denote the first entry time into Si , we have 



where X Tl 6 <9Si C Si \ So for which F has been computed in the previous step. By 
iterating the algorithm we eventually obtain F on all set boundaries dSi, i = 0,1,2, .. .. 
Thus, the milestoning iteration can be implemented as an outer loop which contains 
the above gradient descent algorithm in every of its iterations. 

Remark 5. The milestoning variant of the gradient descent algorithm only requires 
the computation of an ensemble of short trajectories of the controlled system \3. 7| j. 
Here "short" means that they are orders of magnitude shorter than those in typical 
path-space sampling algorithms like TPS, and equilibrium milestoning or FFS. 

4--1- Guiding example: computing the mean first passage time 

We consider the uncontrolled dynamics (|2.1[) with the one-dimensional potential shown 
in Figure [2] Suppose we are interested in computing the mean first passage time to 
the set S = [—1.1,-1] in terms of the free energy (12.31) . Let 



be the first hitting time of S, consider the constant function / = a, and the scaled 
moment generating function 



considered as a function of a. The quantity of interest is the mean first passage time 
of the uncontrolled dynamics, 




j 




r = inf{t > 0: X t € dS} . 



ip a {x) = E a: [exp(-crT/e)], 



dip a 



-1.5 -1 -0.5 0.5 1 1.5 

x 

Figure 2. Skew double-well potential V. 
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Figure 3. Reference solution for the uncontrolled mean first passage time (left 
panel) and the related free energy F a for a = 1 (right panel). Results based on 
finite element discretization of 1 13.211 with high precision. 

for e = 0.5. 

In order to obtain a reference solution with high accuracy we first compute ip a 
by discretizing the elliptic boundary value problem (|3.2I) based on a standard finite 
element discretization on a fine grid. This is possible because the state space dimension 
in this guiding example is small but will not be possible in realistically high dimensions. 
The resulting reference solution for E x [t] is shown in the left panel of Figure |3] below, 
along with the associated free energy F a (x) = —e\ogip a (x) in the right panel. 

An approximation of the free energy was then computed by the greedy milestoning 
/ gradient descent algorithm described above that minimizes the cost functional (|4.ip 
in the coefficients a — (a\, . . . , a m ). As scalar ansatz functions Vj we chose to = 10 
Gaussians with width 0.1 whose centers where uniformly spaced in the complement of 
S. Once the minimization had been converged, the value function (free energy) and 
the resulting optimal control law were given by 

m tci 
3=1 2=1 

with bj = — VSVwj. The result agree with the reference solution shown in Fig. [3] 
(deviations are of the order of the accurarcy threshold used in the gradient descent 
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Figure 4. Optimally tilted potential potential (left panel) and the first If iterates 
of the gradient descent (right panel) . 



algorithm). Figure @] shows the resulting optimally tilted potential G = V + 2F, 
together with first few iteration steps of the gradient search. The mean first passage 
time of the tilted system 

dX t = -VG{X s )dt + V2ldB s , (4.2) 

i.e., with V in (|2.ip replaced by the new potential G, is shown in Figure [SJ 

As has been outlined above the algorithm only requires the computation of rather 
short trajectories since for all iterative potentials the mean first passage time is orders 
of magnitude smaller than for the original dynamics; the mean first passage time of 
the optimally tilted potential, e.g., is around 100 times smaller than originally. 

5. Conclusions and outlook 

We have developed a simulation scheme for rare events that is based on an optimal 
change of measure that boils down to a logarithmic transformation of the path 
functional under consideration. The measure transformation turns the original 
exponential path functional into the functional of an optimal control problem that 
is linear in the observable and quadratic in the control variables. Although analytic 
solutions to the optimal control problem are available only in simple situations and 
computing the optimal change of measure may require to solve a possibly high- 
dimensional optimal control problem numerically, there is a considerable speed-up 
coming from (a) the fact that the functional is linear-quadratic and allows for the 
design of robust unbiased Monte-Carlo estimators and (b) the fact that events that 
were rare originally are no longer rare under the new probability measure. The gain 
in the numerical complexity requires that the optimal control problem can be solved 
efficiently, and, with the equivalence between path sampling and optimal control in 
hand, we have sketched a numerical algorithm for computing the optimal control 
that is based on an easy-to-implement inexact gradient descent that can be solved 
rather efficiently using milestoning. The algorithm was tested, computing the optimal 
feedback for the controlled passage between metastable sets in a double- well potential. 
Even though the numerical example that we presented is tiny on the scale of typical 
molecular dynamics applications, we emphasize that the minimization algorithm is 
independent of the dimension of the system and hence admits an easy generalization 
to more complicated systems; we refer to the rich literature on machine learning 
and queuing networks where various strategies for treating high dimensional systems 
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have been developed (e.g., see |5j). Finally we note that all ideas presented in 
this article can be readily extended to more complicated dynamics (e.g., degenerate 
diffusions with dissipation) and time-dependent path functionals (e.g., to simulate 
single- molecule experiments); it is even possible to consider situations where the 
exponential path functional involves additional control variables, in which case a 
logarithmic transformation leads to a game rather than an optimal control problem 
(cf. [25]). Further open issues are the deterministic limit of the stochastic control 
problem, the convergence analysis of the gradient descent and the rigorous analysis of 
fluctuations in systems under feedback control (cf. [35]). 



Appendix A. Computational aspects 



In order to compute the gradient of (|4.ip with respect to to the unknown coefficients 
a = (a\, . . . ,a m ), it is convenient to discretize the path measure Q — Q(a). To this 
end, let = t < t\ < . . . < tjy = r be a set of time nodes with h = t k+ i — t k where 
we assume for the moment that r < oo is deterministic. Euler's method applied to 

dX t = (V2c(X t ) - Wpf t )) dt + V2^dB t . 

gives 

X k+1 =X k + h (V2c{X k ) - W(X fe )) + V2fe77 fe+ i 

where the r\ k are i.i.d. random variables that are normally distributed with mean zero 
and unit covariance. Since the r\ k are Gaussian, the density of the distribution Qh{a) 
of discrete paths (Xq, . . . , Xn) C K™ conditional on Xq = xo is readily shown to be 

(fih(x , ...,x N ;a) = (Z^a))- 1 exp (~S h (x , x N ; a)) (A.l) 

with the discrete action 



L N-l 
k=Q 



X k+1 - X k 



VV{x k ) - v / 2c( 



x k , 



2 



(A.2) 
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and the normalization constant 

Z h = exp(-Sh(x ,---,x N ;a))dxi...dx N - 

Jl»x...xl" 

Computing the gradient of the discretized functional 

~N-1 



(A.3) 



I h (a;X )=-E% 



Y,g h (X k ,c(X k )) 



k=0 



with 



g h (x,c{x)) = h^f(x) + ^Y^ajbj(x) j 



is now straightforward. Assuming that Xq is independent of the control, we have 

rjv-i 



dcij 



Qh 



^ da , 

k=0 3 



dS h 1 dZ h 
dcij Zh dcij 



where both and dgh/ daj are evaluated at (x k , c(x k )). Specifically, 
dgh 



da 3 

dSh 
da 3 

dZ h 
da-j 



hc(xk,t k )bj(x k 



N—l 



" ^ Xk+1 ~ Xk 



^ to V 



+ VV(x k ) - V2c(x k ^j bj(x k ) 



dS h 



da -i 



Together with the projection property of the conditional expectation this gives 



9Jh = E x 



dgh 



daj 



9h, 



dSh 
daj 



(A.4) 



where C>Q h denotes the covariance operator 

CQ h [«, »] = E Qh [u«] - E Qh [u]E Qh [v]. 



Inexact gradient We are interested in the situation when r in (|3.6[) is a random 
stopping time rather than a fixed time; otherwise the optimal control policy would 
be a function of time, i.e., u t — c(X t ,t). But in case that r is a first entry time of 
a set S C R n , this stopping time r = r(c) will be a function of the control. Hence 
the derivative of the cost functional with respect to the unknown control coefficients 
aj would involve additional derivatives of r or its time-discrete counterpart N T ; for 
example, for the discretized running cost this would result in an expression like 



TT~ Y] 9h(xk,c(x k )) = g h (x N T.-i,c(xN r -i))® lK 



da^ 



k=0 



daj 



N T -1 ~ 

E dgh 
daj 

fe=0 J 



In principle the dependence of the stopping time on the control variable can be 
made explicit in terms of the solution to an elliptic boundary value problem for r, yet 
it is unclear how terms such as dN T /daj can be handled numerically efficiently. 

In many cases the gradient descent will also converge even though the gradient 
VJ is not exact, and it turns out that the boundary cost in the last equations is 
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typically small compared to the accumulated cost. Ignoring the contribution from 
the boundary terms in the derivatives hence gives a gradient descent method with 
inaccurate gradient. In our numerical example where / = a is constant, the inexact 
gradient reads 

(x k )bj(x k ) 

■n t -i N T —1 
^ (ct + -|c(X fc )| 2 ), Vk+ibj(x k ) 

. k=0 fc=0 

(x k )bj(x k ) 

/N T -1 \ N T -1 

^(a + -| C (X fe )| 2 ) J2 Vk+Mx,) , 

, \ fc=0 / fc=0 

where N T = \t/K\ is the discrete analog of the first hitting time (here \x~\ is the nearest 
integer larger than x), and we used the fact that bj(X k ) and r] k+ i are independent. 
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